Missing Data and Imputation

Authors

Javier Estrada

Michael Underwood

Elizabeth Subject-Scott

Published

April 10, 2023


Introduction

Missing Data

Missing data occurs when values are absent from a dataset. There are many possible reasons, intentional or unintentional, and missingness is classified into the following three categories, otherwise known as missingness mechanisms (Mainzer et al. 2023):

  • Missing completely at random (MCAR): the probability that a value is missing is independent of both the observed and the unobserved values.

  • Missing at random (MAR): the probability that a value is missing depends only on the observed values.

  • Missing not at random (MNAR): the probability that a value is missing depends on the missing values themselves, and possibly also on the observed values.
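As a toy illustration (simulated variables x and y; the missingness rates are arbitrary), the three mechanisms can be generated in R:

```r
set.seed(1)
n <- 1000
x <- rnorm(n)            # fully observed variable
y <- x + rnorm(n)        # variable that will receive missing values

# MCAR: missingness is independent of x and y
y_mcar <- ifelse(runif(n) < 0.3, NA, y)
# MAR: missingness depends only on the observed x
y_mar <- ifelse(runif(n) < plogis(2 * x), NA, y)
# MNAR: missingness depends on the unobserved y itself
y_mnar <- ifelse(runif(n) < plogis(2 * y), NA, y)

c(mcar = mean(is.na(y_mcar)), mar = mean(is.na(y_mar)), mnar = mean(is.na(y_mnar)))
```

All three incomplete variables look similar in isolation; the difference is only visible when the missingness indicator is related to x (MAR) or to y itself (MNAR).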

Figure 1: Graphical Representation of Missingness Mechanisms (Schafer and Graham 2002)

(X are the completely observed variables. Y are the partly missing variables. Z is the component of the cause of missingness unrelated to X and Y. R is the missingness.)

Looking for patterns in the missing data can help us determine to which category it belongs. These mechanisms are important in deciding how to handle the missing data. MCAR is the best-case scenario but seldom occurs; MAR and MNAR are more common.

The problem with ignoring missing values is that the remaining data may not give a true representation of the dataset, which can bias the analysis and reduces its statistical power (van Ginkel et al. 2020). To enhance the quality of the research, two practices should be followed: explicitly acknowledge missing data problems and the conditions under which they occur, and employ principled methods to handle the missing data (Dong and Peng 2013).

Methods to Deal with Missing Data

There are three families of methods to deal with missing data: likelihood and Bayesian methods, weighting methods, and imputation methods (Cao et al. 2021). Missing data can also be handled by simply deleting it.

  • The likelihood/Bayesian method combines information from a prior distribution with evidence obtained from the sample to predict a value. It requires technical coding and advanced statistical knowledge.

  • The weighting method is a traditional approach in which weights derived from the available data are used to adjust for non-response in a survey. It becomes inefficient when there are extreme weights or when many weights are needed.

  • The imputation method uses an estimate derived from the original dataset to fill in the missing value. There are two types of imputation: single and multiple.

Deleting missing data

Listwise deletion removes the entire observation from the dataset. Deleting missing data can lead to the loss of important information and is therefore generally not recommended. In certain cases, when the amount of missing data is small and the mechanism is MCAR, listwise deletion can be used: there usually won’t be bias, but potentially important information may still be lost.

T-tests and chi-square tests can be used on pairs of variables, comparing the cases with and without missing values, to determine whether the groups’ means differ significantly. According to van Ginkel et al. (2020), if the test is significant, the null hypothesis is rejected, indicating that the missing values are not randomly scattered throughout the data; this implies that the missing data are MAR or MNAR. Conversely, if nonsignificant, this implies that the data cannot be MAR. It does not eliminate the possibility that they are MNAR; other information about the population is needed to determine this.
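A sketch of such a check in R, on simulated data where Income is missing more often for younger subjects (both variable names and cutoffs are invented for this illustration):

```r
set.seed(42)
# Simulated data: Income is missing more often for younger subjects (MAR)
age <- round(runif(200, 20, 60))
income <- 50 + age + rnorm(200, sd = 10)
income[runif(200) < plogis((40 - age) / 5)] <- NA

# Compare mean age between rows with and without a missing Income value;
# a significant difference indicates the data are not MCAR
t.test(age ~ is.na(income))
```

Here the t-test comes out highly significant, correctly flagging that the missingness in income is related to the observed age.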

Whenever missing data are categorized as MAR or MNAR, listwise deletion would be wasteful and the analysis biased. Alternative methods of dealing with the missing data are recommended: either pairwise deletion or imputation.

Pairwise deletion removes an observation only from the analyses that involve its missing variable. It allows more data to be analyzed than listwise deletion but limits the ability to make inferences about the total sample. For this reason, imputation is recommended to properly deal with missing data.
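A small illustration of the difference, using the `use` argument of R’s cor() on simulated data:

```r
set.seed(7)
# Simulated data: missing values in variable a only
d <- data.frame(a = rnorm(50), b = rnorm(50), c = rnorm(50))
d$a[1:10] <- NA

# Listwise: any row with an NA is dropped for every pair of variables
cor_listwise <- cor(d, use = "complete.obs")
# Pairwise: each pair uses every row that is complete for that pair
cor_pairwise <- cor(d, use = "pairwise.complete.obs")

# Under pairwise deletion the b-c correlation uses all 50 rows;
# under listwise deletion it is computed on only the 40 complete rows
c(pairwise = cor_pairwise["b", "c"], listwise = cor_listwise["b", "c"])
```

Pairwise deletion salvages the 10 rows for the b–c pair, but different cells of the correlation matrix are now based on different subsamples, which is what limits inference about the total sample.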

Preferred Method to Handle Missing Data

Imputation is the preferred method of handling missing data. It consists of replacing missing data with estimates obtained from the original, available data; after imputation, there is a full dataset to analyze. To preserve statistical power, one rule of thumb is that the number of imputations should be at least equal to the percentage of missing data (5% missing, 5 imputations; 10%, 10 imputations; 20%, 20 imputations; etc.) (Pedersen et al. 2017). According to Wulff and Jeppesen (2017), 3–5 imputations are often sufficient, and 10 are more than enough.

Single, or univariate, imputation is when only one estimate is used to replace each missing value. Methods of single imputation include the mean, the last observation carried forward, and random imputation. A brief explanation of each:

  • Mean imputation is straightforward: the mean of the observed values of the variable is calculated and substituted for each missing value. The problem with this method is that it reduces the variance, which leads to confidence intervals that are too narrow.

  • Last Observation Carried Forward (LOCF) replaces a missing value in longitudinal studies with a previously observed value (the most recent value is carried forward) (Streiner 2008). The problem with this method is that it assumes the previously observed value persists unchanged, which in reality is usually not the case.

  • Random imputation randomly draws an observed value and uses it to replace a missing value. The problem with this method is that it introduces additional variability.

These single imputation methods are flawed. They often underestimate standard errors and produce p-values that are too small (Dong and Peng 2013), which biases the analysis. Multiple imputation is therefore the better method: it handles missing data better and provides less biased results.
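The variance shrinkage from mean imputation is easy to demonstrate on simulated data:

```r
set.seed(3)
y <- rnorm(500, mean = 10, sd = 2)
y_miss <- y
y_miss[sample(500, 150)] <- NA          # 30% missing, completely at random

# Mean imputation: every hole gets the observed mean
y_imp <- ifelse(is.na(y_miss), mean(y_miss, na.rm = TRUE), y_miss)

# The imputed points sit exactly on the mean, so the spread shrinks
sd(y_miss, na.rm = TRUE)   # spread of the observed values
sd(y_imp)                  # strictly smaller after mean imputation
```

Because every imputed point sits exactly at the mean, the standard deviation of the filled-in dataset is guaranteed to be smaller than that of the observed values, which is the source of the too-narrow confidence intervals mentioned above.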

Multiple, or multivariate, imputation uses several estimates to replace the missing data by creating multiple versions of the original dataset. It can be done using a regression model, or a sequence of regression models, such as linear, logistic, and Poisson. A plausible value is generated for each unobserved data point in each of M imputations, resulting in M complete datasets (Dong and Peng 2013). The new values are randomly drawn from predictive distributions, either through joint modeling (JM, which is not used much anymore) or fully conditional specification (FCS) (Wongkamthong and Akande 2023). Each dataset is then analyzed, and the results are pooled to obtain a single set of estimates.

The purpose of multiple imputation is to create a pool of imputed data for analysis; if the pooled results are lacking, multiple imputation should not be done (Mainzer et al. 2023). Another reason not to use multiple imputation is when there are very few missing values; there may be no benefit in using it. Also worth noting: some statistical analysis software already has built-in features to deal with missing data.

Multiple imputation by chained equations, otherwise known as MICE, is the most common and preferred method of multiple imputation (Wulff and Jeppesen 2017). It provides a more reliable way to analyze data with missing values. For this reason, this paper will focus on the methodology and application of the MICE process.

Code
#loading packages
library(DiagrammeR)

Figure 2: Flowchart of the MICE-process based on procedures proposed by Rubin (Wulff and Jeppesen 2017)

Code
DiagrammeR::grViz("digraph {

# initiate graph
graph [layout = dot, rankdir = LR, label = 'The MICE-Process\n\n',labelloc = t, fontcolor = DarkSlateBlue, fontsize = 45]

# global node settings
node [shape = rectangle, style = filled, fillcolor = AliceBlue, fontcolor = DarkSlateBlue, fontsize = 35]
bgcolor = none

# label nodes
incomplete [label =  'Incomplete data set']
imputed1 [label = 'Imputed \n data set 1']
estimates1 [label = 'Estimates from \n analysis 1']
rubin [label = 'Rubin rules', shape = diamond]
combined [label = 'Combined results']
imputed2 [label = 'Imputed \n data set 2']
estimates2 [label = 'Estimates from \n analysis 2']
imputedm [label = 'Imputed \n data set m']
estimatesm [label = 'Estimates from \n analysis m']


# edge definitions with the node IDs
incomplete -> imputed1 [arrowhead = vee, color = DarkSlateBlue]
imputed1 -> estimates1 [arrowhead = vee, color = DarkSlateBlue]
estimates1 -> rubin [arrowhead = vee, color = DarkSlateBlue]
incomplete -> imputed2 [arrowhead = vee, color = DarkSlateBlue]
imputed2 -> estimates2 [arrowhead = vee, color = DarkSlateBlue]
estimates2-> rubin [arrowhead = vee, color = DarkSlateBlue]
incomplete -> imputedm [arrowhead = vee, color = DarkSlateBlue]
imputedm -> estimatesm [arrowhead = vee, color = DarkSlateBlue]
estimatesm -> rubin [arrowhead = vee, color = DarkSlateBlue]
rubin -> combined [arrowhead = vee, color = DarkSlateBlue]
}")

*Rubin’s Rules: average the estimates across the m imputations; calculate the variance of the estimates within and between imputations; combine them using an adjustment term (1 + 1/m).

Other Methods of Imputation

There are other methods of imputation worth noting; they are briefly described below.

Regression Imputation is based on a linear regression model. Missing data are randomly drawn from a conditional distribution when the variables are continuous, and from a logistic regression model when they are categorical (van Ginkel et al. 2020).

Predictive Mean Matching is also based on a linear regression model. The approach is the same as regression imputation for categorical missing values but differs for continuous variables: instead of random draws from a conditional distribution, each missing value is filled using the predicted values of the outcome variable (van Ginkel et al. 2020).

Hot Deck (HD) imputation is when a missing value is replaced by an observed response of a similar unit, also known as the donor. It can be either random or deterministic (based on a metric or value) (Thongsri and Samart 2022). It does not rely on model fitting.

Stochastic Regression (SR) Imputation is an extension of regression imputation. The process is the same, but a residual term drawn from the normal distribution of the regression residuals is added to the imputed value (Thongsri and Samart 2022). This maintains the variability of the data.

Random Forest (RF) Imputation is based on machine learning algorithms. Missing values are first replaced with the mean or mode of that particular variable and then the dataset is split into a training set and a prediction set (Thongsri and Samart 2022). The missing values are then replaced with predictions from these sets. This type of imputation can be used on continuous or categorical variables with complex interactions.
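A random hot-deck imputation can be sketched in a few lines of R (hot_deck is a hypothetical helper written for this illustration, not a library function):

```r
set.seed(11)
# Random hot-deck imputation: replace each missing value with an observed
# value ("donor") drawn at random from the same variable
hot_deck <- function(x) {
  donors <- x[!is.na(x)]
  x[is.na(x)] <- sample(donors, sum(is.na(x)), replace = TRUE)
  x
}

v <- c(3, NA, 7, 5, NA, 9)
hot_deck(v)   # the NAs are replaced by draws from {3, 7, 5, 9}
```

A deterministic variant would instead pick the donor closest to the incomplete case on some matching metric; either way, no regression model needs to be fitted.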

Methodology

Multiple Imputation by Chained Equations (MICE)

In multiple imputation, an imputed value is created for each missing data point in each of M imputations, resulting in M complete datasets. From each of the M datasets, an estimate \(\hat{\theta}_m\) of the parameter \(\theta\) is obtained.

The combined estimator of \(\theta\) is given by:

\[\hat{\theta}_{M} = \frac{1}{M}\sum_{m=1}^{M} \hat{\theta}_{m}\]

The proposed variance estimator of \(\hat{\theta}_{M}\) is given by:

\[\hat{\Phi}_{M} = \overline{\phi}_{M} + \left(1 + \frac{1}{M}\right)B_{M}\]

where \(\overline{\phi}_{M} = \frac{1}{M}\sum_{m=1}^{M}\hat{\phi}_{m}\) is the average within-imputation variance

and \(B_{M} = \frac{1}{M-1}\sum_{m=1}^{M}\left(\hat{\theta}_{m} - \hat{\theta}_{M}\right)^{2}\) is the between-imputation variance.

(Arnab 2017)
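The pooling formulas can be traced with a small numeric sketch; the per-imputation estimates and variances below are made-up illustration values:

```r
# Hypothetical estimates and variances from M = 5 imputed datasets
theta_hat <- c(1.02, 0.97, 1.05, 0.99, 1.01)       # per-imputation estimates
phi_hat   <- c(0.040, 0.038, 0.041, 0.039, 0.040)  # per-imputation variances
M <- length(theta_hat)

theta_M <- mean(theta_hat)                          # combined estimate
phi_bar <- mean(phi_hat)                            # within-imputation variance
B_M     <- sum((theta_hat - theta_M)^2) / (M - 1)   # between-imputation variance
Phi_M   <- phi_bar + (1 + 1/M) * B_M                # total variance

c(estimate = theta_M, variance = Phi_M)
```

The (1 + 1/M) term inflates the between-imputation component to account for using a finite number of imputations.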

The chained equation process has the following steps (Azur et al. 2011):

Step 1:

Using simple imputation (e.g., the mean), replace every missing value; these replacements are referred to as “place holders”.

Step 2:

The “place holder” values for one variable are set back to missing.

Step 3:

The observed values from this variable (dependent variable) are regressed on the other variables (independent variables) in the model, under the same assumptions as when performing linear, logistic, or Poisson regression.

Step 4:

The missing values are replaced with predictions from this newly created model.

Step 5:

Repeat Steps 2–4 for each variable that has missing values, until all missing values have been re-imputed once; this completes one cycle.

Step 6:

Repeat Steps 2–5, updating the imputations each cycle, for as many cycles as are required.
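The steps above can be sketched in base R for two hypothetical continuous variables. For brevity this sketch uses deterministic regression predictions; a real MICE implementation such as mice() draws each imputation from the predictive distribution instead:

```r
set.seed(5)
# Toy data: two correlated continuous variables, each partly missing
n <- 200
d <- data.frame(x = rnorm(n))
d$y <- d$x + rnorm(n)
mx <- sample(n, 30)              # rows where x is missing
my <- sample(n, 30)              # rows where y is missing
d$x[mx] <- NA; d$y[my] <- NA

# Step 1: placeholder imputation with the observed means
d$x[mx] <- mean(d$x, na.rm = TRUE)
d$y[my] <- mean(d$y, na.rm = TRUE)

for (cycle in 1:10) {            # Step 6: repeat the cycle several times
  # Steps 2-4 for x: refit on rows where x was observed, then re-predict
  fit_x <- lm(x ~ y, data = d[-mx, ])
  d$x[mx] <- predict(fit_x, newdata = d[mx, ])
  # Step 5: do the same for the next incomplete variable, y
  fit_y <- lm(y ~ x, data = d[-my, ])
  d$y[my] <- predict(fit_y, newdata = d[my, ])
}
```

After a handful of cycles the imputations stabilize; repeating the whole procedure with different random draws would produce the m imputed datasets used in pooling.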

Analysis and Results

Data and Visualizations

Load Data and Packages
Code
# load data
credit = read.csv("credit_data.csv")

# load libraries
library(gtsummary)
library(dplyr, warn.conflicts=FALSE)
library(mice, warn.conflicts=FALSE)
Description of Dataset

Credit score data

Details of Dataset

The credit.csv file comes from the website of Dr. Lluís A. Belanche Muñoz, by way of a GitHub repository of Dr. Gaston Sanchez. It contains data on 4,454 subjects and stores a combination of continuous, categorical, and count values across 15 variables. Among them, the Status variable contains binary categorical values, “good” and “bad”, describing the kind of credit score each subject has. One data point was missing the outcome and was removed from the original data.

Definition of Data in Dataset
| Variable | Type | Description |
|----------|------|-------------|
| X | Integer | Count variable indexing the subjects. |
| Status | Character | 2-level categorical variable indicating the status of the subject’s credit: good or bad. |
| Seniority | Integer | Count variable indicating the seniority a subject has accumulated over the course of their life. |
| Home | Character | 6-level categorical variable indicating the subject’s relationship to their residential address: rent, owner, parents, priv, other, or ignore. |
| Time | Integer | Count variable showing how many months have elapsed since the subject’s payment deadline without paying their debt in full. |
| Age | Integer | Count variable indicating the subject’s age (in years). |
| Marital | Character | 5-level categorical variable indicating the subject’s marital status: single, married, separated, divorced, or widow. |
| Records | Character | 2-level categorical variable indicating whether the subject has a credit history record: yes or no. |
| Job | Character | 4-level categorical variable indicating the type of job the subject has: fixed, freelance, partime, or others. |
| Expenses | Integer | Count variable indicating the amount of expenses (in USD) a subject has. |
| Income | Integer | Count variable indicating the amount of income (in thousands of USD) a subject earns annually. |
| Assets | Integer | Count variable indicating the amount of assets (in USD) a subject has. |
| Debt | Integer | Count variable indicating the amount of debt (in USD) a subject has. |
| Amount | Integer | Count variable indicating the amount of money (in USD) remaining in a subject’s bank account. |
| Price | Integer | Count variable indicating the amount of money a subject earns by the end of the month. |
Summary of Dataset:
Code
credit %>%
  tbl_summary(by = Status,
              missing_text = "NA") %>%
  add_p() %>%
  add_n() %>%
  add_overall() %>%
  modify_header(label ~ "**Variable**") %>%
  modify_caption("**Summary of Credit Data**") %>%
  bold_labels()
Summary of Credit Data
Variable N Overall, N = 4,454¹ bad, N = 1,254¹ good, N = 3,200¹ p-value²
X 4,454 2,228 (1,114, 3,341) 2,222 (1,142, 3,366) 2,232 (1,098, 3,326) 0.3
Seniority 4,454 5 (2, 12) 2 (1, 6) 7 (2, 14) <0.001
Home 4,448 <0.001
    ignore 20 (0.4%) 9 (0.7%) 11 (0.3%)
    other 319 (7.2%) 146 (12%) 173 (5.4%)
    owner 2,107 (47%) 390 (31%) 1,717 (54%)
    parents 783 (18%) 233 (19%) 550 (17%)
    priv 246 (5.5%) 84 (6.7%) 162 (5.1%)
    rent 973 (22%) 388 (31%) 585 (18%)
    NA 6 4 2
Time 4,454 48 (36, 60) 48 (36, 60) 48 (36, 60) <0.001
Age 4,454 36 (28, 45) 34 (27, 42) 36 (28, 46) <0.001
Marital 4,453 <0.001
    divorced 38 (0.9%) 14 (1.1%) 24 (0.8%)
    married 3,241 (73%) 829 (66%) 2,412 (75%)
    separated 130 (2.9%) 64 (5.1%) 66 (2.1%)
    single 977 (22%) 328 (26%) 649 (20%)
    widow 67 (1.5%) 19 (1.5%) 48 (1.5%)
    NA 1 0 1
Records 4,454 773 (17%) 429 (34%) 344 (11%) <0.001
Job 4,452 <0.001
    fixed 2,805 (63%) 580 (46%) 2,225 (70%)
    freelance 1,024 (23%) 333 (27%) 691 (22%)
    others 171 (3.8%) 68 (5.4%) 103 (3.2%)
    partime 452 (10%) 271 (22%) 181 (5.7%)
    NA 2 2 0
Expenses 4,454 51 (35, 72) 49 (35, 75) 52 (35, 68) 0.8
Income 4,073 125 (90, 170) 100 (74, 148) 130 (100, 178) <0.001
    NA 381 217 164
Assets 4,407 3,000 (0, 6,000) 0 (0, 4,000) 4,000 (0, 7,000) <0.001
    NA 47 20 27
Debt 4,436 0 (0, 0) 0 (0, 0) 0 (0, 0) 0.3
    NA 18 13 5
Amount 4,454 1,000 (700, 1,300) 1,100 (800, 1,415) 1,000 (700, 1,250) <0.001
Price 4,454 1,400 (1,117, 1,692) 1,423 (1,062, 1,728) 1,400 (1,134, 1,678) >0.9
¹ Median (IQR); n (%)
² Wilcoxon rank sum test; Pearson's Chi-squared test
Evaluate Dataset

First, we evaluate the dataset for missing values. As indicated in the table, the data does contain NA/missing values. We can create a table that shows each variable and how many missing values they have:

Code
# Shows which variables have missing values and how many
colSums(is.na(credit))
        X    Status Seniority      Home      Time       Age   Marital   Records 
        0         0         0         6         0         0         1         0 
      Job  Expenses    Income    Assets      Debt    Amount     Price 
        2         0       381        47        18         0         0 

We now analyze the data to decide how to handle the missing values. To do this, we create a new dataset, called new_credit, that deletes the rows with missing data. We want to preserve the original dataset so we can later implement the method we intend to use to address the missing values. We can then count the rows of the new dataset to determine how many observations were deleted in total.

Code
# Creates a new dataset excluding missing values 
new_credit = na.omit(credit)

# Number of rows of new dataset
nrow(new_credit)
[1] 4039

We started out with 4,454 rows and our new dataset has 4,039: 415 rows were deleted due to missing data. To run a regression on complete cases only, we would be throwing away 9.3% of our data because of missingness. Instead, we can use multiple imputation to fill in the missing values so that we don’t have to discard such valuable information.

MICE in R

Using the mice (Multivariate Imputation by Chained Equations) package in R, a statistical programming language, we will create multiple datasets with imputed values for the missing entries. Because just under 10% of the rows contain missing data, we will generate 10 imputations, i.e., 10 new datasets. The mice package does this by creating plausible values from the other columns and placing them into the cells with missing data.

The first step is to check the missingness by looking for patterns in the original dataset using the md.pattern() function:

Code
# Drop the row-index column X before checking patterns
credit <- credit[-c(1)]
md.pattern(credit, rotate.names = TRUE)

     Status Seniority Time Age Records Expenses Amount Price Marital Job Home
4039      1         1    1   1       1        1      1     1       1   1    1
366       1         1    1   1       1        1      1     1       1   1    1
22        1         1    1   1       1        1      1     1       1   1    1
7         1         1    1   1       1        1      1     1       1   1    1
8         1         1    1   1       1        1      1     1       1   1    1
4         1         1    1   1       1        1      1     1       1   1    1
3         1         1    1   1       1        1      1     1       1   1    0
2         1         1    1   1       1        1      1     1       1   1    0
1         1         1    1   1       1        1      1     1       1   0    1
1         1         1    1   1       1        1      1     1       1   0    0
1         1         1    1   1       1        1      1     1       0   1    1
          0         0    0   0       0        0      0     0       1   2    6
     Debt Assets Income    
4039    1      1      1   0
366     1      1      0   1
22      1      0      1   1
7       1      0      0   2
8       0      0      1   2
4       0      0      0   3
3       0      0      1   3
2       0      0      0   4
1       1      1      0   2
1       0      0      0   5
1       1      1      1   1
       18     47    381 455

In the accompanying plot, blue indicates observed values and red indicates missing values. There are 11 missingness patterns.

In order to perform multiple imputation on categorical data, all string variables must be converted to factors using the as.factor() function (van Buuren 2011):

Code
credit$Status = as.factor(credit$Status)
credit$Home = as.factor(credit$Home)
credit$Marital = as.factor(credit$Marital)
credit$Records = as.factor(credit$Records)
credit$Job = as.factor(credit$Job)

Using the mice() function, 10 multiple imputations will be generated for the missing values. The default is m = 5, so m must be set to the number of imputations desired. Since the dataset contains both numerical and categorical variables (with 2 and more levels), the defaultMethod argument lists four methods:

  • pmm: predictive mean matching (numeric data);

  • logreg: logistic regression imputation (binary data, factor with 2 levels);

  • polyreg: polytomous regression imputation for unordered categorical data (factor with > 2 levels);

  • polr: proportional odds model for ordered categorical data (factor with > 2 levels).

The seed argument will be set to 1337 (any number can be used here) so that the same results are retrieved each time the multiple imputation is performed.

Code
Multiple_Imputation = mice(data = credit, maxit = 10, m = 10, defaultMethod = c("pmm", "logreg", "polyreg", "polr"), seed = 1337)

 iter imp variable
  1   1  Home  Marital  Job  Income  Assets  Debt
  1   2  Home  Marital  Job  Income  Assets  Debt
  1   3  Home  Marital  Job  Income  Assets  Debt
  1   4  Home  Marital  Job  Income  Assets  Debt
  1   5  Home  Marital  Job  Income  Assets  Debt
  1   6  Home  Marital  Job  Income  Assets  Debt
  1   7  Home  Marital  Job  Income  Assets  Debt
  1   8  Home  Marital  Job  Income  Assets  Debt
  1   9  Home  Marital  Job  Income  Assets  Debt
  1   10  Home  Marital  Job  Income  Assets  Debt
  2   1  Home  Marital  Job  Income  Assets  Debt
  2   2  Home  Marital  Job  Income  Assets  Debt
  2   3  Home  Marital  Job  Income  Assets  Debt
  2   4  Home  Marital  Job  Income  Assets  Debt
  2   5  Home  Marital  Job  Income  Assets  Debt
  2   6  Home  Marital  Job  Income  Assets  Debt
  2   7  Home  Marital  Job  Income  Assets  Debt
  2   8  Home  Marital  Job  Income  Assets  Debt
  2   9  Home  Marital  Job  Income  Assets  Debt
  2   10  Home  Marital  Job  Income  Assets  Debt
  3   1  Home  Marital  Job  Income  Assets  Debt
  3   2  Home  Marital  Job  Income  Assets  Debt
  3   3  Home  Marital  Job  Income  Assets  Debt
  3   4  Home  Marital  Job  Income  Assets  Debt
  3   5  Home  Marital  Job  Income  Assets  Debt
  3   6  Home  Marital  Job  Income  Assets  Debt
  3   7  Home  Marital  Job  Income  Assets  Debt
  3   8  Home  Marital  Job  Income  Assets  Debt
  3   9  Home  Marital  Job  Income  Assets  Debt
  3   10  Home  Marital  Job  Income  Assets  Debt
  4   1  Home  Marital  Job  Income  Assets  Debt
  4   2  Home  Marital  Job  Income  Assets  Debt
  4   3  Home  Marital  Job  Income  Assets  Debt
  4   4  Home  Marital  Job  Income  Assets  Debt
  4   5  Home  Marital  Job  Income  Assets  Debt
  4   6  Home  Marital  Job  Income  Assets  Debt
  4   7  Home  Marital  Job  Income  Assets  Debt
  4   8  Home  Marital  Job  Income  Assets  Debt
  4   9  Home  Marital  Job  Income  Assets  Debt
  4   10  Home  Marital  Job  Income  Assets  Debt
  5   1  Home  Marital  Job  Income  Assets  Debt
  5   2  Home  Marital  Job  Income  Assets  Debt
  5   3  Home  Marital  Job  Income  Assets  Debt
  5   4  Home  Marital  Job  Income  Assets  Debt
  5   5  Home  Marital  Job  Income  Assets  Debt
  5   6  Home  Marital  Job  Income  Assets  Debt
  5   7  Home  Marital  Job  Income  Assets  Debt
  5   8  Home  Marital  Job  Income  Assets  Debt
  5   9  Home  Marital  Job  Income  Assets  Debt
  5   10  Home  Marital  Job  Income  Assets  Debt
  6   1  Home  Marital  Job  Income  Assets  Debt
  6   2  Home  Marital  Job  Income  Assets  Debt
  6   3  Home  Marital  Job  Income  Assets  Debt
  6   4  Home  Marital  Job  Income  Assets  Debt
  6   5  Home  Marital  Job  Income  Assets  Debt
  6   6  Home  Marital  Job  Income  Assets  Debt
  6   7  Home  Marital  Job  Income  Assets  Debt
  6   8  Home  Marital  Job  Income  Assets  Debt
  6   9  Home  Marital  Job  Income  Assets  Debt
  6   10  Home  Marital  Job  Income  Assets  Debt
  7   1  Home  Marital  Job  Income  Assets  Debt
  7   2  Home  Marital  Job  Income  Assets  Debt
  7   3  Home  Marital  Job  Income  Assets  Debt
  7   4  Home  Marital  Job  Income  Assets  Debt
  7   5  Home  Marital  Job  Income  Assets  Debt
  7   6  Home  Marital  Job  Income  Assets  Debt
  7   7  Home  Marital  Job  Income  Assets  Debt
  7   8  Home  Marital  Job  Income  Assets  Debt
  7   9  Home  Marital  Job  Income  Assets  Debt
  7   10  Home  Marital  Job  Income  Assets  Debt
  8   1  Home  Marital  Job  Income  Assets  Debt
  8   2  Home  Marital  Job  Income  Assets  Debt
  8   3  Home  Marital  Job  Income  Assets  Debt
  8   4  Home  Marital  Job  Income  Assets  Debt
  8   5  Home  Marital  Job  Income  Assets  Debt
  8   6  Home  Marital  Job  Income  Assets  Debt
  8   7  Home  Marital  Job  Income  Assets  Debt
  8   8  Home  Marital  Job  Income  Assets  Debt
  8   9  Home  Marital  Job  Income  Assets  Debt
  8   10  Home  Marital  Job  Income  Assets  Debt
  9   1  Home  Marital  Job  Income  Assets  Debt
  9   2  Home  Marital  Job  Income  Assets  Debt
  9   3  Home  Marital  Job  Income  Assets  Debt
  9   4  Home  Marital  Job  Income  Assets  Debt
  9   5  Home  Marital  Job  Income  Assets  Debt
  9   6  Home  Marital  Job  Income  Assets  Debt
  9   7  Home  Marital  Job  Income  Assets  Debt
  9   8  Home  Marital  Job  Income  Assets  Debt
  9   9  Home  Marital  Job  Income  Assets  Debt
  9   10  Home  Marital  Job  Income  Assets  Debt
  10   1  Home  Marital  Job  Income  Assets  Debt
  10   2  Home  Marital  Job  Income  Assets  Debt
  10   3  Home  Marital  Job  Income  Assets  Debt
  10   4  Home  Marital  Job  Income  Assets  Debt
  10   5  Home  Marital  Job  Income  Assets  Debt
  10   6  Home  Marital  Job  Income  Assets  Debt
  10   7  Home  Marital  Job  Income  Assets  Debt
  10   8  Home  Marital  Job  Income  Assets  Debt
  10   9  Home  Marital  Job  Income  Assets  Debt
  10   10  Home  Marital  Job  Income  Assets  Debt

The following R code will show the imputed values. Columns are imputations, rows are observations.

Code
head(Multiple_Imputation$imp, 10)
$Status
 [1] 1  2  3  4  5  6  7  8  9  10
<0 rows> (or 0-length row.names)

$Seniority
 [1] 1  2  3  4  5  6  7  8  9  10
<0 rows> (or 0-length row.names)

$Home
           1       2       3     4       5       6       7       8       9
30     owner parents    rent other   owner parents    rent   other    rent
240  parents   owner    priv owner    rent parents parents parents parents
1060 parents parents parents  priv parents   owner parents   other parents
1677   owner   owner   owner owner parents   owner   other   owner   owner
2389    rent    rent parents owner   other   other parents    rent    priv
2996   owner   owner    rent owner parents parents parents   owner   owner
        10
30    rent
240   rent
1060  rent
1677  rent
2389  rent
2996 owner

$Time
 [1] 1  2  3  4  5  6  7  8  9  10
<0 rows> (or 0-length row.names)

$Age
 [1] 1  2  3  4  5  6  7  8  9  10
<0 rows> (or 0-length row.names)

$Marital
           1       2       3       4       5       6      7       8       9
3319 married married married married married married single married married
          10
3319 married

$Records
 [1] 1  2  3  4  5  6  7  8  9  10
<0 rows> (or 0-length row.names)

$Job
            1         2     3       4         5         6     7         8
30  freelance freelance fixed   fixed   partime freelance fixed     fixed
912     fixed   partime fixed partime freelance freelance fixed freelance
            9      10
30  freelance partime
912   partime partime

$Expenses
 [1] 1  2  3  4  5  6  7  8  9  10
<0 rows> (or 0-length row.names)

$Income
       1   2   3   4   5   6   7   8   9  10
30    71 120  92 137 151  85 130 132 245 120
114  117 148  62  89 140  65 204  98 155 148
144  151 120 959 250 250 120 230 189 100 254
153  130  99 164  95 117 176  55 156 155 100
158  116 120 265 108 250 210  85 205 130 270
177  178 240 180 230 241 800 230 183 230 241
195  120 310 227 160 112 176 300 103 163 300
206  170 116 100 240 264 223  45  63 154 285
241   80 300 160  60 105 122  60 150 168 100
242  139 320 240 164 100 133 145 121  65 187
278  135  80  80 225  55 150  73 410 199 114
318   71 151 131  60  80  76 117  60  65 120
330  296 200 176 136 130 160 120 147 150 150
333  157 176 113  80 240 189  39 200 200  93
335  234 150 105 117 126 100 210 167  65  95
356   85  71 100 112 254 130 100  52  80 132
360  165 124 170 113  59  88 108 155  71 130
394  350 500 500 150 150 500 500 350 491 491
404  150  85 105  88 107 100 120 142 185 118
422   92 205 108 145 169 205 136 160  70 150
439  260 400 125 152  86 110 115 220 155  92
444  250 190 111 115  67 196 100 138  80 225
462  173 160 242  54  60 210 122 150 102 222
469  112 145 227 178 100 145 176  92 100 112
479  135 148  77 145 145  19  80 105 147 180
481  400 190 138 115 213 116 182  72 131 137
483  154  95  51 198  85  70  79 108 144  95
(Output truncated: each row lists the index of a row with a missing value, followed by the ten values imputed for it, one per imputed dataset.)

We can check the quality of the imputations with a strip plot, a one-dimensional scatter plot. It shows the distribution of each variable in each imputed dataset. We want the imputations to be values that could plausibly have been observed had the data not been missing.

Code
# Note: mice's stripplot() is lattice-based, so par(mfrow = c(7, 2)) has no
# effect on it; calling stripplot(Multiple_Imputation, pch = 19) with no
# variable would instead draw all imputed variables in one figure
stripplot(Multiple_Imputation, Status, pch = 19, xlab = "Imputation number")

Code
stripplot(Multiple_Imputation, Seniority, pch = 19, xlab = "Imputation number")

Code
stripplot(Multiple_Imputation, Home, pch = 19, xlab = "Imputation number")

Code
stripplot(Multiple_Imputation, Time, pch = 19, xlab = "Imputation number")

Code
stripplot(Multiple_Imputation, Age, pch = 19, xlab = "Imputation number")

Code
stripplot(Multiple_Imputation, Marital, pch = 19, xlab = "Imputation number")

Code
stripplot(Multiple_Imputation, Records, pch = 19, xlab = "Imputation number")

Code
stripplot(Multiple_Imputation, Job, pch = 19, xlab = "Imputation number")

Code
stripplot(Multiple_Imputation, Expenses, pch = 19, xlab = "Imputation number")

Code
stripplot(Multiple_Imputation, Income, pch = 19, xlab = "Imputation number")

Code
stripplot(Multiple_Imputation, Assets, pch = 19, xlab = "Imputation number")

Code
stripplot(Multiple_Imputation, Debt, pch = 19, xlab = "Imputation number")

Code
stripplot(Multiple_Imputation, Amount, pch = 19, xlab = "Imputation number")

Code
stripplot(Multiple_Imputation, Price, pch = 19, xlab = "Imputation number")
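Beyond eyeballing the strip plots, the same sanity check can be made numerically by comparing summary statistics of the observed and imputed values for a variable. The plots above are mice-specific R; as a language-neutral illustration of the idea, here is a minimal Python sketch using made-up numbers (not values from the credit dataset):

```python
# Sanity check for imputations: imputed values should resemble values that
# could have been observed. All numbers below are hypothetical.
from statistics import mean, stdev

observed = [90, 125, 100, 218, 95, 224, 115, 142, 126, 178]   # hypothetical observed values
imputed  = [100, 130, 110, 200, 105, 210, 120, 150, 130, 170]  # hypothetical imputed values

def summarize(values):
    """Return (mean, sd, min, max) for a list of numbers."""
    return mean(values), stdev(values), min(values), max(values)

obs_mean, obs_sd, obs_min, obs_max = summarize(observed)
imp_mean, imp_sd, imp_min, imp_max = summarize(imputed)

# A crude red flag: imputed values far outside the observed range, or a
# mean/spread wildly different from the observed data.
in_range = obs_min <= imp_min and imp_max <= obs_max
print(f"observed mean={obs_mean:.1f} sd={obs_sd:.1f}")
print(f"imputed  mean={imp_mean:.1f} sd={imp_sd:.1f}")
print(f"imputed values within observed range: {in_range}")
```

This is only a coarse screen; the strip plots remain the primary diagnostic because they show the full shape of each imputed distribution, not just its summary statistics.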

Next, we fit the analysis model to each imputed dataset with the with() function and then pool the resulting estimates with pool(), arriving at single estimates whose standard errors properly account for the missing data. The summary of the pooled results gives the estimate, standard error, test statistic, degrees of freedom, and p-value for each variable.

Code
# fit the complete-data model to each imputed dataset
fit <- with(Multiple_Imputation,
            glm(Status ~ Seniority + Home + Time + Age + Marital + Records +
                  Job + Expenses + Income + Assets + Debt + Amount + Price,
                family = binomial))

# pool and summarize the results
summary(pool(fit))
               term      estimate    std.error    statistic        df      p.value
1       (Intercept)  1.000072e+00 7.333866e-01   1.36363518 4086.1682 1.727575e-01
2         Seniority  8.307737e-02 7.454796e-03  11.14415122 4392.3238 1.835445e-28
3         Homeother  6.182192e-02 5.726574e-01   0.10795621 4373.7006 9.140354e-01
4         Homeowner  1.153055e+00 5.581845e-01   2.06572343 4411.1415 3.891287e-02
5       Homeparents  9.338966e-01 5.670622e-01   1.64690319 4381.7089 9.964965e-02
6          Homepriv  4.280971e-01 5.756352e-01   0.74369521 4410.1427 4.571005e-01
7          Homerent  4.120033e-01 5.614812e-01   0.73377929 4401.8470 4.631223e-01
8              Time -2.817114e-04 3.477564e-03  -0.08100826 4191.6466 9.354393e-01
9               Age -1.088836e-02 4.994322e-03  -2.18014705 4133.3853 2.930279e-02
10   Maritalmarried  6.047650e-01 4.180067e-01   1.44678300 4197.5359 1.480324e-01
11 Maritalseparated -6.783140e-01 4.624775e-01  -1.46669624 4221.1376 1.425332e-01
12    Maritalsingle  1.599798e-01 4.236380e-01   0.37763332 4183.9203 7.057222e-01
13     Maritalwidow  1.680857e-01 5.277448e-01   0.31849802 4336.1022 7.501225e-01
14       Recordsyes -1.783863e+00 1.020700e-01 -17.47686040 4250.4947 4.190340e-66
15     Jobfreelance -7.627158e-01 1.017639e-01  -7.49495736 4251.6598 8.023279e-14
16        Jobothers -7.035646e-01 2.018017e-01  -3.48641453 4404.0341 4.943215e-04
17       Jobpartime -1.475815e+00 1.258215e-01 -11.72942735 4397.6722 2.624321e-31
18         Expenses -1.508892e-02 2.636234e-03  -5.72366383 3134.7955 1.140839e-08
19           Income  7.021486e-03 7.549578e-04   9.30050160  208.2641 1.968912e-17
20           Assets  2.041218e-05 6.455763e-06   3.16185408  363.8802 1.699297e-03
21             Debt -1.607499e-04 3.604554e-05  -4.45963310  335.9344 1.121017e-05
22           Amount -1.934283e-03 1.717728e-04 -11.26070778 3999.3868 5.574744e-29
23            Price  8.747658e-04 1.263752e-04   6.92197277 4173.7383 5.134169e-12
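Under the hood, pool() combines the per-imputation results with Rubin's rules: the pooled estimate is the mean of the estimates across imputations, and the total variance adds the average within-imputation variance to the between-imputation variance inflated by a factor of (1 + 1/m). As a language-neutral sketch of that arithmetic, here is a small Python illustration with made-up numbers (not the credit-data output above):

```python
# Rubin's rules for pooling one coefficient across m imputed datasets.
# All numbers are hypothetical, for illustration only.
from math import sqrt

estimates = [0.52, 0.48, 0.55, 0.50, 0.47]       # per-imputation coefficient estimates
variances = [0.010, 0.012, 0.009, 0.011, 0.010]  # per-imputation squared standard errors

m = len(estimates)
q_bar = sum(estimates) / m                       # pooled point estimate
u_bar = sum(variances) / m                       # average within-imputation variance
b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)  # between-imputation variance
t = u_bar + (1 + 1 / m) * b                      # total variance of the pooled estimate
pooled_se = sqrt(t)

print(f"pooled estimate = {q_bar:.4f}, pooled SE = {pooled_se:.4f}")
```

The between-imputation term is what distinguishes multiple imputation from single imputation: it carries the extra uncertainty due to the missing values into the pooled standard errors.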

We now have pooled estimates, built from complete datasets, that properly account for the missing data!

Conclusion

In conclusion, missing data can occur in research for a variety of reasons, and it is never a good idea to ignore it. Doing so leads to biased parameter estimates, loss of information, decreased statistical power, and weakened reliability of findings (Dong and Peng 2013). The best course of action is to handle the missingness with multiple imputation. When missing data is discovered, first identify it and look for patterns in the missingness. Next, choose the variables in the dataset related to the missing values to be used for imputation, and create the necessary number of completed datasets. Fit the model to each completed dataset, pool the results, and finally analyze the pooled estimates. Performing these steps minimizes the adverse effects of missing data on the analysis (Pampka, Hutcheson, and Williams 2016).

References

Arnab, R. 2017. Survey Sampling Theory and Applications. Academic Press. https://www.sciencedirect.com/topics/mathematics/imputation-method.
Azur, M. J., E. A. Stuart, C. Frangakis, and P. J. Leaf. 2011. “Multiple Imputation by Chained Equations: What Is It and How Does It Work?” Int J Methods Psychiatr Res. 20 (1): 40–49. https://onlinelibrary.wiley.com/doi/epdf/10.1002/mpr.329.
Cao, Y., H. Allore, B. V. Wyk, and Gutman R. 2021. “Review and Evaluation of Imputation Methods for Multivariate Longitudinal Data with Mixed-Type Incomplete Variables.” Statistics in Medicine 41 (30): 5844–76. https://doi-org.ezproxy.lib.uwf.edu/10.1002/sim.9592.
Dong, Y., and C. J. Peng. 2013. “Principled Missing Data Methods for Researchers.” SpringerPlus 2 (222). https://doi.org/10.1186/2193-1801-2-222.
Mainzer, R., M. Moreno-Betancur, C. Nguyen, J. Simpson, J. Carlin, and K. Lee. 2023. “Handling of Missing Data with Multiple Imputation in Observational Studies That Address Causal Questions: Protocol for a Scoping Review.” BMJ Open 13: 1–6. http://dx.doi.org/10.1136/bmjopen-2022-065576.
Pampka, M., G. Hutcheson, and J. Williams. 2016. “Handling Missing Data: Analysis of a Challenging Data Set Using Multiple Imputation.” International Journal of Research & Method in Education 39 (1): 19–37. https://doi.org/10.1080/1743727X.2014.979146.
Pedersen, A. B., E. M. Mikkelsen, D. Cronin-Fenton, N. R. Kristensen, T. M. Pham, L. Pedersen, and I. Petersen. 2017. “Missing Data and Multiple Imputation in Clinical Epidemiological Research.” Clinical Epidemiology 9: 157–66. https://www.tandfonline.com/doi/full/10.2147/CLEP.S129785.
Schafer, J. L., and J. W. Graham. 2002. “Missing Data: Our View of the State of the Art.” Psychological Methods 7 (2): 147–77. https://psycnet.apa.org/doi/10.1037/1082-989X.7.2.147.
Streiner, D. L. 2008. “Missing Data and the Trouble with LOCF.” EBMH 11 (1): 1–5. http://dx.doi.org/10.1136/ebmh.11.1.3-a.
Thongsri, T., and K. Samart. 2022. “Composite Imputation Method for the Multiple Linear Regression with Missing at Random Data.” International Journal of Mathematics and Computer Science 17 (1): 51–62. http://ijmcs.future-in-tech.net/17.1/R-Samart.pdf.
van Buuren, S., and K. Groothuis-Oudshoorn. 2011. “mice: Multivariate Imputation by Chained Equations in R.” Journal of Statistical Software 45: 1–67. https://doi.org/10.18637/jss.v045.i03.
van Ginkel, J. R., M. Linting, R. C. Rippe, and A. van der Voort. 2020. “Rebutting Existing Misconceptions about Multiple Imputation as a Method for Handling Missing Data.” Journal of Personality Assessment 102 (3): 2812–31. https://doi.org/10.1080/00223891.2018.1530680.
Wongkamthong, C., and O. Akande. 2023. “A Comparative Study of Imputation Methods for Multivariate Ordinal Data.” Journal of Survey Statistics and Methodology 11 (1): 189–212. https://doi.org/10.1093/jssam/smab028.
Wulff, J. N., and L. E. Jeppesen. 2017. “Multiple Imputation by Chained Equations in Praxis: Guidelines and Review.” Electronic Journal of Business Research Methods 15 (1): 41–56. https://vbn.aau.dk/ws/files/257318283/ejbrm_volume15_issue1_article450.pdf.